TAT: An Author Profiling Tool with Application to Arabic Emails
نویسندگان
چکیده
This paper reports on the application of the Text Attribution Tool (TAT) to profiling the authors of Arabic emails. The TAT system has been developed for the purpose of language-independent author profiling and has now been trained on two email corpora, English and Arabic. We describe the overall TAT system and the Machine Learning experiments resulting in classifiers for the different author traits. Predictions for demographic and psychometric author traits show improvements over the baseline for some of the author traits with both the English and the Arabic data. Arabic presents particular challenges for NLP and this paper describes more specifically the text processing components developed to handle Arabic emails.
منابع مشابه
Author Profiling for English Emails
This paper reports on some aspects of a project aimed at automating the analysis of texts for the purpose of author profiling and identification. The complete analysis provides probabilities for the author’s basic demographic traits (gender, age, geographic origin, level of education and native language) as well as for five psychometric traits. We describe the email data which was collected for...
متن کاملHigh capacity steganography tool for Arabic text using 'Kashida'
Steganography is the ability to hide secret information in a cover-media such as sound, pictures and text. A new approach is proposed to hide a secret into Arabic text cover media using "Kashida", an Arabic extension character. The proposed approach is an attempt to maximize the use of "Kashida" to hide more information in Arabic text cover-media. To approach this, some algorithms have been des...
متن کاملEvolving Fuzzy Neural Network for Phishing Emails Detection
One of the broadly used internet attacks to deceive customers financially in banks and agencies is unknown “zero-day” phishing Emails “zero-day” phishing Emails is a new phishing email that it has not been trained on old dataset, not included in black list. Accordingly, the current paper seeks to Detection and Prediction of unknown “zero-day” phishing Emails by provide a new framework called Ph...
متن کاملA Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
متن کاملArabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides for its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have overtaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
متن کامل